apply: Use staged deployment on booted systems #298
Conversation
I tested this in a VM and it works. However, one thing I noticed that's lost is the repo pruning. The deployment is now written out in the shutdown path and ostree makes the decision not to trigger pruning there. I think we want to add a call to |
Force-pushed from 8bcc0eb to 4e943ea.
LGTM, modulo the prune question.
Hmm, CI hit a test timeout, which I'd also seen in OBS. There's currently a 360-second timeout for the slower tests and I routinely see the flatpak install test take nearly 300 seconds on my fast laptop. I think I'll add a commit to bump that to 600 seconds.
More detail in phabricator, but for pruning it appears that rpm-ostree just does the normal cleanup once the deployment is staged. Since the rollback deployment hasn't been deleted at that point (it's deleted when finalizing), you don't get the disk space back immediately. That would happen the next time you apply an update (with the current code). If we want to be more aggressive about reclaiming the disk space, we'd have to add another |
When finalizing an OSTree deployment, the current `/etc` is merged with the new commit's `/usr/etc`. Any changes that happen in the current `/etc` after the deployment has been finalized will not appear in the new deployment. Since eos-updater is often run in the background, it's likely the user will make changes in `/etc` (such as creating a new user) long before the new deployment is booted into.

To address this issue, OSTree has provided the concept of a staged deployment since 2018.5. The new deployment is initialized but not finalized until shutdown, via the `ostree-finalize-staged.service` systemd unit. Since staged deployments only work on OSTree booted systems that can initiate systemd units, this can't really work in the current test suite. The old full deployment method is kept for that case.

Note that staged deployment finalization depends on the `ostree-finalize-staged.path` systemd unit being activated. Currently, OSTree does this on demand but in the future it may require the OS to explicitly activate the unit via a systemd preset or similar mechanism.

https://phabricator.endlessm.com/T5658
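For reference, here is a minimal sketch (not the actual eos-updater code) of how the choice between the two paths can be made with libostree, using `ostree_sysroot_get_booted_deployment()` to detect a booted system and `ostree_sysroot_stage_tree()` (2018.5) for the staged case; the helper name and argument list are illustrative:

```c
#include <ostree.h>

/* Illustrative helper: stage the new deployment on a booted system, or fall
 * back to the old full deployment path (e.g. in the test suite). */
static gboolean
deploy_commit (OstreeSysroot *sysroot,
               const char    *osname,
               const char    *revision,
               GKeyFile      *origin,
               GCancellable  *cancellable,
               GError       **error)
{
  g_autoptr(OstreeDeployment) merge_deployment =
    ostree_sysroot_get_merge_deployment (sysroot, osname);
  g_autoptr(OstreeDeployment) new_deployment = NULL;

  if (ostree_sysroot_get_booted_deployment (sysroot) != NULL)
    {
      /* Booted OSTree system: initialize the deployment now; the /etc merge
       * and bootloader update happen at shutdown via
       * ostree-finalize-staged.service. */
      return ostree_sysroot_stage_tree (sysroot, osname, revision, origin,
                                        merge_deployment,
                                        NULL /* kernel args */,
                                        &new_deployment,
                                        cancellable, error);
    }

  /* Not booted from OSTree (e.g. the test suite): write out the full
   * deployment immediately, as before. */
  if (!ostree_sysroot_deploy_tree (sysroot, osname, revision, origin,
                                   merge_deployment,
                                   NULL /* kernel args */,
                                   &new_deployment,
                                   cancellable, error))
    return FALSE;

  return ostree_sysroot_simple_write_deployment (sysroot, osname,
                                                 new_deployment,
                                                 merge_deployment,
                                                 OSTREE_SYSROOT_SIMPLE_WRITE_DEPLOYMENT_FLAGS_NONE,
                                                 cancellable, error);
}
```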
On our CI and package builders `test-update-install-flatpaks` often exceeds the 360 second timeout. Even on my fast laptop it routinely takes nearly 300 seconds. Bump the timeout for slow tests to 600 seconds to ensure it has time to complete. https://phabricator.endlessm.com/T5658
An idea I haven't thought through:
Force-pushed from 4e943ea to a49c5e9.
I'm sure the answer is "not easily" but can the Flatpak install test be made to take less than 300 seconds, perhaps with a crude implement such as |
I would like that, since the test suite turnaround is slow for development. I need to figure out where it's taking so much time. I added the commit here, which we can banter about. I also punted on the more aggressive pruning, which we can also discuss.
I like this. It could be even simpler since |
My thinking was that we don't want it to block the boot process, cf. |
Ah, right. Especially since this has to run |
Another idea that's a little simpler to avoid
To keep it from blocking, I actually think the above would be useful upstream as a complement to the existing |
Perhaps it should put the stamp file into
Good point about |
This branch LGTM. Do you want to wait until you have a version of the cleanup-on-next-boot logic implemented or land this and have that as a follow-up?
It looks like ostreedev/ostree#2510 is probably not imminent, so I'm going to do something similar downstream. I think I'm going to try to address it here, but if I get stuck I think we can merge this and fix the pruning in a follow-up.
I added a commit to handle cleanup with a drop-in for `ostree-finalize-staged.service`. I tested this in a VM and it seems to DTRT. On my SSD system it took 5 seconds and didn't block gdm:
LGTM
# Only /sysroot and /boot need to be written to.
ProtectSystem=strict
ReadWritePaths=/sysroot /boot
But if you can write to `/sysroot`, what can't you write to on the main disk?
True. I guess the only way to really narrow this is to make it truly pruning only, and then you could limit it to `/sysroot/repo`. `ostree admin cleanup` does do other things in `/sysroot/boot`, `/sysroot/ostree/boot.*` and `/sysroot/ostree/deploy`.
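For illustration only (this is not in the branch), a pruning-only unit narrowed along those lines might look roughly like this, assuming the sysroot repo lives at `/sysroot/ostree/repo` on a deployed system:

```ini
# Hypothetical sketch, not the committed unit: only viable if the service
# were reduced to repo pruning, since `ostree admin cleanup` also touches
# /sysroot/boot, /sysroot/ostree/boot.* and /sysroot/ostree/deploy.
[Service]
ProtectSystem=strict
ReadWritePaths=/sysroot/ostree/repo
```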
While it's true that everything is really under `/sysroot`, I don't believe you meant this as a blocker. It does mean that inadvertent writing to `/etc` isn't possible. Unless you have objections, I think we should just carry on with this.
Agreed
When OSTree staged deployments are used, the old rollback deployment is deleted during system shutdown. To keep from slowing down shutdown, the OSTree repo is not pruned at that time. That means that even though the deployment was deleted, the objects are still on disk. Since that may be a significant amount of wasted disk space, the full cleanup with repo pruning needs to be run at some time after rebooting. See ostreedev/ostree#2510 for details.

To detect when cleanup is necessary, a systemd drop-in is added to touch the `/sysroot/.cleanup` file after `ostree-finalize-staged.service` has finalized the new deployment. The reason to use a drop-in for `ostree-finalize-staged.service` rather than creating the file from `eos-updater` is to avoid the situation where an unclean shutdown occurs and the new deployment is not finalized. In that case, cleanup would be run unnecessarily on the next boot.

A new systemd service, `eos-updater-autocleanup.service`, is added to run `ostree admin cleanup` when `/sysroot/.cleanup` exists and then delete it afterwards. This adds a dependency on the `ostree` CLI, but a separate program could be provided calling the `ostree_sysroot_cleanup` API and deleting the `/sysroot/.cleanup` file itself.

https://phabricator.endlessm.com/T5658
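As a rough sketch (file names and exact directives here are illustrative, not necessarily what the commit adds), the drop-in and the cleanup unit could look something like this, assuming an appended `ExecStop=` command in a drop-in runs after the unit's own finalize command and only if it succeeded:

```ini
# Hypothetical drop-in, e.g.
# /usr/lib/systemd/system/ostree-finalize-staged.service.d/50-eos-updater-cleanup.conf
[Service]
# Touch the stamp only after finalization has actually run, so an unclean
# shutdown that skips finalization doesn't trigger a needless cleanup.
ExecStop=/bin/touch /sysroot/.cleanup
```

```ini
# Hypothetical eos-updater-autocleanup.service
[Unit]
Description=Prune the OSTree repo after a staged deployment was finalized
ConditionPathExists=/sysroot/.cleanup

[Service]
Type=oneshot
ExecStart=/usr/bin/ostree admin cleanup
ExecStartPost=/bin/rm -f /sysroot/.cleanup
# (How this unit gets pulled into the boot transaction is omitted here.)
```

With `ConditionPathExists=` the service is a cheap no-op on boots where no staged deployment was finalized, which keeps it from slowing down normal startup.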
Force-pushed from b5c457e to 9fa6754.
Something I thought of later is that the |